Abstract

In this paper we propose a vision system that performs image Super Resolu- 
tion (SR) with selectivity. Conventional SR techniques, either by multi-image 
fusion or example-based construction, have failed to capitalize on the intrinsic 
structural and semantic context in the image, and performed "blind" resolu- 
tion recovery to the entire image area. By comparison, we advocate example- 
based selective SR whereby selectivity is exemplified in three aspects: region 
selectivity (SR only at object regions), source selectivity (object SR with 
trained object dictionaries), and refinement selectivity (object boundaries 
refinement using matting). The proposed system takes over-segmented low- 
resolution images as inputs, assimilates recent learning techniques of sparse 
coding (SC) and grouped multi-task lasso (GMTL), and leads eventually to 
a framework for joint figure-ground separation and SR of objects of interest. The
efficiency of our framework is manifested in our experiments with subsets of 
the VOC2009 and MSRC datasets. We also demonstrate several interesting 
vision applications that can build on our system. 

Keywords: image super resolution, semantic image segmentation, vision 
system, vision application 



1. Introduction 

Super-resolution (SR) image reconstruction is the process of recovering a
high-resolution image from one or more low-resolution input images [1].
In the frequency domain, this corresponds to resolving the beyond-Nyquist
high-frequency components from the aliased version of the spectrum [2].
The SR problem is under-determined by nature, because in practice many
high-resolution images can produce the same low-resolution image(s). It
therefore comes as no surprise that the extensive research on SR has
focused on providing additional constraints and/or incorporating various
kinds of prior knowledge. Accordingly, existing SR techniques can be
broadly classified into two categories: 1) classical multi-image fusion,
and 2) the more sophisticated example-based construction.

Figure 1: Visual applications of our selective SR system. (From left
to right) zoom blurring, object pop-up, and image composition. (For better
view, please refer to the electronic version)

Multi-image fusion techniques normally require multiple images of the
same scene with subpixel relative displacements as inputs. When sufficiently
many such images are provided, direct SR reconstruction is feasible in both
the spatial and frequency domains by solving systems of linear equations.
Even when too few images are available, they can be incorporated into
explicit regularization frameworks with various kinds of prior knowledge
(mostly smoothness). A good review of all these techniques is provided in
pQ. This family of SR algorithms is attractive due to the simplicity of its
algorithmic implementation and the ease of obtaining inputs through
multi-camera imaging and video capture. As demonstrated by many authors
[3J H], however, fusion-based SR can in practice only provide a
magnification factor of less than two. This has severely limited its use in
many applications.

This limitation has been overcome since the introduction of example-based
SR techniques (or Image Hallucination) [31 [5]. Techniques in this vein
learn low- and high-resolution image patches from a collection of low-
and high-resolution image pairs. Upon completion of the learning phase,
early algorithms (e.g. [21 El El El) find a matching high-resolution patch
for each low-resolution input patch and simply take that high-resolution
patch as the recovered one. These selection-based methods normally entail
large training datasets to ensure that the desired high-resolution patches
can be well approximated by existing training patches. By comparison,
recent developments (cf. [8, 9, 10]) have introduced additional flexibility
and scalability by treating the patch generation process as a regression
problem over properly trained/selected basis patches. This innovation has
made SR accessible to a wide range of applications with training sets of
manageable sizes.

Figure 2: Overview of our selective image SR system. Given a
low-resolution input image, our system first performs over-segmentation,
and then employs GMTL (based on pre-trained dictionaries) to decide whether
each segment belongs to the object region, leading to figure-ground
separation. Image matting is used to refine the separation. The system
finally constructs a super-resolved version of the object region with the
object dictionaries, and other special effects (zoom blur, part emboss) can
be implemented by processing either the foreground or the background region.

Despite the considerable success SR research has achieved over the decades,
it is unfortunate that SR has been treated in isolation as a simple image
enhancement process. In view of the integral nature of vision research, it
is reasonable to start linking SR with other computer vision tasks, such as
figure/ground separation and weak object identification. In fact, it has
been realized before that "blind" SR reconstruction over the whole image
area may not work well, resulting for example in over-smoothed edges and
corners [11]. To mitigate this adverse effect, SR algorithms may be made
adaptive to specific image regions. In other words, the intrinsic image
structure (such as edges and region segments) and even the semantic context
of image regions (such as spatial layout, figure/ground separation, and
localized labels for regions) need to be explicitly accounted for. This
coincides with the ongoing theme of recognition-oriented vision research,
namely building the synergy between shape and appearance modeling (e.g.
epitomic shape and appearance modeling [12] and region-based recognition
[13]). Furthermore, the coincidence sheds light on the integration of SR
reconstruction with other typical vision tasks, such as segmentation and
recognition.

Moving in this direction, we propose an image SR system with selectivity.
Compared to conventional SR algorithms, selectivity is exercised in three
modules of the system: 1) perform SR only at the object region(s) and blur
out the background region (region selectivity), 2) generate super-resolved
patches from training dictionaries of objects rather than background
(dictionary selectivity), and 3) refine the figure/ground boundaries with
alpha matting techniques [14, 15] (refinement selectivity). These three
levels of selectivity are made possible by the integration and adaptation
of several state-of-the-art techniques, including image over-segmentation,
sparse coding dictionary learning, multitask Lasso, and image matting.
Overall, the proposed system is able to separate the foreground objects
from the rest of the image, perform SR on the object region, and yield
visually pleasing results after matting and other simple post-processing.
Figure 1 shows three special visual effects for a single low-resolution
input image. Figure 2 gives an overview of our system pipeline and the
various techniques involved.
2. Related Work



The current work follows the line of example-based SR, which dates back
to the seminal papers [51 H]. In these pioneering works, techniques such as
Markov random fields (MRF) with belief propagation (BP) and hierarchical
nearest neighbor matching were used to establish the low- to high-resolution
patch correspondences. Thereafter, [11] introduced primal sketch priors
to improve the reconstruction at blurred edges, ridges, and corners. All
these early example-based SR techniques assume that the recovered patches
are selected from the training datasets. This inevitably requires
large-scale datasets for satisfactory patch approximation and hence
reconstruction. This limitation of the selection paradigm triggered the
development of the regression paradigm, in which novel patches can be
generated from a (usually linear) combination of the existing ones. In this
respect, [8] and [10] used, respectively, a linear combination of several
nearest neighboring patches and a sparsest combination of training patches
to approximate the input low-resolution patch, and hence to establish the
low- and high-resolution correspondences. Our work is inspired by [10],
which applied sparsity priors to SR.

Sparse coding traces its roots to signal processing and compression¹ and
was recently introduced to the vision community for face recognition [16]
and many other vision tasks [17]. Recent research in learning techniques
has extended sparse coding to multiple-task or multi-label decision
scenarios [18, 19, 20], whereby sparsity is exploited for feature selection
and semantic inference. Another direction of extension is unsupervised or
weakly supervised structure discovery, or over-complete coding dictionary
learning [21, 22, 23]. These formulations explicitly require that the
learned visual structures or patterns (encoded as a visual dictionary, in
parallel with the popular "bag-of-features" technique) facilitate sparse
reconstruction. The current work builds on both extensions and tunes them
to our selective SR applications.

The theme of joint vision problem solving exhibited by the current work
is not novel. In recognition-oriented vision research, the possibilities
and advantages of joint visual object segmentation, detection, recognition,
and analysis have been suggested in many works, e.g. the most recent ones
[13], [24]. Our proposal of simultaneous object/background (or simply
figure/ground) separation and SR reconstruction moves along the same line,
hoping to discover new possibilities. Our empirical results seem to provide
us with positive answers.

¹ Hence it is also termed "compressive/compressed sensing" in the signal
processing community.

Alpha matting is used for accurate extraction of an image foreground with
transparent boundaries [14, 15]. One of the great challenges in matting is
to obtain an initial figure-ground separation map (the "trimap") of decent
accuracy. The output of the figure-ground separation in our algorithm (from
the solution to the GMTL, as explained later) takes fractional values near
the boundaries and can be used for this purpose. Matting is used to
fine-tune the boundary regions of our highlighted foreground object.

3. Image SR with Selectivity 

Our selective SR system stems from the confluence of computer vision,
machine learning, and graphics techniques. Among them, core mid-level
vision research in image segmentation, the development and application of
sparse coding and its derivatives such as the multi-task Lasso and
(sparsity-induced) dictionary learning, and the extensively researched
alpha matting process are integrated into our system. This section gives an
overview of these techniques and provides some necessary details, following
the pipeline of our framework (as in Figure 2).

3.1. Image Segmentation 

Image segmentation has been one of the central topics in mid-level
computer vision, accounting for the "visual grouping" advocated by the
Gestalt school of visual perception. Literally, image segmentation is the
partitioning of an image into coherent groups of pixels, such that each
group corresponds to semantic-level objects or parts of objects. The
criterion for grouping is normally based on multiple image cues, such as
pixel intensity, color, texture, and motion as bottom-up driving forces,
and categorical prior knowledge for top-down modulation.

Despite the extensive research on image segmentation since the early days
of vision research, and the general belief that recognition and similar
high-level vision tasks should be founded on segmentation, incorporating
segmentation into other high-level vision tasks has not seen much success.
This is in part because of the lack of a reliable and efficient
segmentation algorithm to date, and is fundamentally determined by the fact
that visual segmentation is inherently hierarchical and task-driven.
Whereas object-level segmentation in general also relies on top-down
semantic-level knowledge, part-level segmentation tends to be much easier
as it is mainly driven by bottom-up low-level features. Hence the latter is
much better researched and normally leads to "over-segmentation".

For our application, over-segmented image regions serve as the unit blocks
for figure-ground analysis and carry over the structural and patch
contextual information. We assume homogeneity within each segmented region
and general consistency in choosing source patches for SR reconstruction.
For implementation, we use the graph-based image segmentation of [25]. This
algorithm uses a very simple and intuitive measure for the local evidence
of boundaries, and makes greedy segmentation decisions that respect global
structures and properties. Moreover, this algorithm has nearly linear
complexity, O(n log n), where n is the number of image pixels (the graph is
not fully connected, so the number of edges grows in proportion to n).
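
As a rough illustration of the greedy, graph-based merging described above,
the following Python sketch segments a tiny grayscale image on a 4-connected
grid graph. It is not the implementation of [25]: the merge rule (join two
components when the connecting edge weight does not exceed either component's
largest internal edge weight plus `k / size`), the constant `k`, and the
function names `grid_edges` and `segment_graph` are assumptions made for this
illustration. Sorting the edges dominates the cost, which is where the
O(n log n) behaviour on a sparse grid comes from.

```python
def grid_edges(image):
    """Build 4-connected grid edges (weight, u, v) from a 2-D list of intensities."""
    height, width = len(image), len(image[0])
    edges = []
    for r in range(height):
        for c in range(width):
            node = r * width + c
            if c + 1 < width:   # edge to the right neighbour
                edges.append((abs(image[r][c] - image[r][c + 1]), node, node + 1))
            if r + 1 < height:  # edge to the neighbour below
                edges.append((abs(image[r][c] - image[r + 1][c]), node, node + width))
    return edges


def segment_graph(num_nodes, edges, k=1.0):
    """Greedy merging over edges sorted by weight; returns one label per node."""
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes  # largest edge weight merged inside each component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for weight, u, v in sorted(edges):
        a, b = find(u), find(v)
        if a == b:
            continue
        # Assumed merge rule: the boundary evidence must be weak relative to
        # the internal variation of both components.
        if weight <= min(internal[a] + k / size[a], internal[b] + k / size[b]):
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]
            internal[a] = weight  # edges are sorted, so this is the largest so far
    return [find(i) for i in range(num_nodes)]


image = [[0, 0, 10, 10],
         [0, 0, 10, 10]]
labels = segment_graph(8, grid_edges(image), k=1.0)
```

On this toy image the left and right halves end up in two separate
components. A larger `k` favours larger segments, so a small `k` is one way
to obtain the over-segmentation that the later stages consume.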

3.2. Sparse Coding and Dictionary Learning 

The input low-resolution image and its over-segments obtained above will
work with the dictionaries trained in this part towards figure-ground
separation via GMTL. The dictionary training is closely related to the
sparse coding technique. In signal processing, sparse coding refers to the
problem of encoding signals (vector $y$) as sparsely as possible (with few
nonzero coefficients) over a known or unknown basis (matrix
$D \in \mathbb{R}^{p \times n}$, every column a $p$-dimensional basis
vector). The basis is usually overcomplete, meaning $n \gg p$. If $D$ is
known, the problem is normally formulated as locating the sparsest
coefficient vector $\alpha^*$ from the combinatorial optimization²

$$\min_{\alpha} \|\alpha\|_0, \quad \text{s.t. } y = D\alpha. \qquad (1)$$

The optimization is NP-hard, and in practice, for $\alpha$ with notable
sparsity, it can be relaxed to an $\ell_1$ convex optimization problem

$$\min_{\alpha} \|\alpha\|_1, \quad \text{s.t. } y = D\alpha. \qquad (2)$$

To account for practical noisy cases, the equality constraint is normally
relaxed to $\|y - D\alpha\|_2^2 \le \epsilon$, or in its Lagrangian form as

$$\min_{\alpha} \|\alpha\|_1 + \lambda \|y - D\alpha\|_2^2, \qquad (3)$$

which is commonly known to the statistical learning community as the Lasso
and can be solved efficiently using the popular Least Angle Regression
(LAR, [27]) method or the stochastic cyclic coordinate descent method [28].
Recently there has been growing interest in sparse coding without a known
basis. Intuitively, learning the basis from the data can create a more
specialized basis system; it also reveals the underlying structure and
helps discover prototypical samples from the data source alone (example
applications in [21]). This is termed "(sparsity-induced) dictionary
learning". Formally, to extract a dictionary
$D \in \mathbb{R}^{p \times n}$ of size $n$ from $m$ data samples
$\{y_i\}_{i=1}^{m}$, $y_i \in \mathbb{R}^p$ ($n < m$), the following
optimization problem is to be tackled:

$$\min_{D, \{\alpha_i\}_{i=1}^{m}} \sum_{i=1}^{m} \left( \|\alpha_i\|_1 + \lambda \|y_i - D\alpha_i\|_2^2 \right). \qquad (4)$$

² $\|\cdot\|_0$ is a pseudo-norm, and simply counts the number of nonzero
elements in a vector.
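
To make the Lasso of Eq. (3) concrete (it is also the sparse-coding
sub-problem inside Eq. (4) once the dictionary is fixed), here is a minimal
cyclic coordinate descent sketch in plain Python. It is an illustration
under assumptions rather than the solver used in the paper: the dictionary
is passed as a list of columns, `lam` plays the role of $\lambda$
multiplying the data term exactly as in Eq. (3), and the names `lasso_cd`
and `soft_threshold` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the scalar l1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(columns, y, lam, n_iter=100):
    """Minimise ||alpha||_1 + lam * ||y - D alpha||_2^2 by cyclic coordinate descent.

    columns: list of n dictionary columns, each a list of p floats.
    y: list of p floats. lam must be positive.
    """
    p = len(y)
    alpha = [0.0] * len(columns)
    residual = list(y)  # residual = y - D alpha, kept up to date incrementally
    for _ in range(n_iter):
        for j, column in enumerate(columns):
            energy = sum(v * v for v in column)
            if energy == 0.0:
                continue
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(column[i] * residual[i] for i in range(p)) + energy * alpha[j]
            # Closed-form minimiser of |a| + lam * (energy * a^2 - 2 * rho * a).
            updated = soft_threshold(rho, 0.5 / lam) / energy
            change = updated - alpha[j]
            if change != 0.0:
                for i in range(p):
                    residual[i] -= change * column[i]
                alpha[j] = updated
    return alpha


# With an identity dictionary the solution is a soft-thresholded copy of y.
coefficients = lasso_cd([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.1], lam=1.0)
```

With the identity dictionary above, the small entry of `y` is driven exactly
to zero while the large one is shrunk by `1 / (2 * lam)`, which is the
feature-selection behaviour the later sections rely on.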

The objective in Eq. (4) is convex with respect to each group of variables,
but not with respect to both simultaneously. Hence one possible solution
scheme is to alternate between dictionary learning, i.e. solving for $D$
given $\{\alpha_i\}_{i=1}^{m}$, and the converse in the subsequent step.
This is exactly the strategy suggested in [21]. In our application, we find
the online dictionary learning algorithm proposed in [22] much more
efficient. This online solution is based on sequential solutions to
quadratic local approximations of the objective function and has proven
convergence. Our dictionary learning involves object images of 5 known
semantic classes and their pixel-level segmentation (detailed in the
experiment part). Image patches (typically of size 3 x 3 or 5 x 5 pixels)
are randomly sampled from both object regions and their corresponding
backgrounds. The sampled patches are used for training a pair of
sparsity-induced dictionaries, one for a particular object class and one
for the related backgrounds. For each dictionary, high-resolution patches
and their corresponding sub-sampled low-resolution patches are trained
together (by concatenating them and properly scaling each), to ensure
consistency between the high- and low-resolution patch bases (as was done
in [10]). These class-specific dictionaries with figure-ground distinction
essentially capture the contextual relationship and co-occurrence of
objects and their surroundings, and can help discriminate between object
and background regions, as described next.
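
The joint training of high- and low-resolution patches mentioned above can
be pictured with the short sketch below. The text only says that the two
parts are concatenated and "properly" scaled, so the particular scaling used
here (dividing each part by the square root of its own length, so that
neither part dominates the reconstruction error) is an assumption, as are
the function names.

```python
import math


def joint_patch_vector(high_res_patch, low_res_features):
    """Concatenate a high-resolution patch and its low-resolution features.

    Each part is divided by sqrt(its length): an assumed balancing choice,
    not a value taken from the paper.
    """
    scaled_high = [v / math.sqrt(len(high_res_patch)) for v in high_res_patch]
    scaled_low = [v / math.sqrt(len(low_res_features)) for v in low_res_features]
    return scaled_high + scaled_low


def split_joint_atom(atom, high_res_length):
    """Split a learned joint dictionary atom back into its two parts."""
    return atom[:high_res_length], atom[high_res_length:]


sample = joint_patch_vector([3.0] * 9, [2.0] * 4)
high_part, low_part = split_joint_atom(sample, 9)
```

Training one dictionary on such joint vectors means every atom carries a
matched high- and low-resolution part, so coefficients estimated from the
low-resolution part can be reused with the high-resolution part.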

Figure 3 shows the foreground and background dictionaries for a particular
semantic class, and their distributions over 20 bins. Note that these
two dictionaries are not dramatically different. This is not surprising
since the dictionary learning process tends to produce a generic basis, as
noted by e.g. [ZE] and many others. Nevertheless, the local variations
between the foreground and background distributions still exhibit notable
differences.

Figure 3: Sparsity-induced dictionaries trained from class-specific
data. The statistics are obtained by clustering these basis vectors
(dictionary patches) into 20 groups, and counting the foreground and
background patches within each group. The curves fitted to these two
sequences are shown at doubled scale for clarity.

3.3. Grouped Multitask Lasso and Figure-Ground Separation 

Over-segmentation and dictionary learning essentially provide the input for
the multitask Lasso in this part. Sparsity-related techniques, such as the
Lasso in Eq. (3), are normally useful for identifying a subset of the most
relevant features amongst a redundant collection (known as feature
selection for short). Recent developments have extended the Lasso to the
grouped setting (the Group Lasso [15], or GL) and the multitask setting
(the MultiTask Lasso [18, 20], or MTL). GL is designed to achieve group
sparsity of variable selection with respect to some pre-defined variable
groups, and is formulated as

$$\min_{\alpha} \|y - D\alpha\|_2^2 + \lambda \sum_{g=1}^{G} \|\alpha_{\mathcal{I}_g}\|_2, \qquad (5)$$

where $\mathcal{I}_g$ is the index set of the variable(s) belonging to the
$g$-th group, for $g = 1, \cdots, G$. On the other hand, MTL aims to obtain
the same sparsity pattern across tasks. Assume we consider $K$ tasks
together, with a superscript (as in $y^{(k)}$) identifying a particular
task. We have the objective

$$\min_{\Omega} \sum_{k=1}^{K} \Big\| y^{(k)} - \sum_{j=1}^{n} \omega_j^{(k)} d_j^{(k)} \Big\|_2^2 + \lambda \sum_{j=1}^{n} \|\omega_j\|_q, \qquad (6)$$

where $d_j^{(k)}$ is the $j$-th basis vector for the $k$-th task, and $n$
is the number of features (the size of the dictionary) as defined in the
last part. Moreover,
$\omega_j = (\omega_j^{(1)}, \cdots, \omega_j^{(K)})^T$ is the vector of
all coefficients for the $j$-th feature across different tasks, and
$\Omega = (\omega_1, \cdots, \omega_n)^T$. In [18, 20], the $q$-norm for
the coefficients is taken as the sup-norm, i.e.
$\|\omega_j\|_\infty = \max_k |\omega_j^{(k)}|$. We argue that other valid
norms can also be taken, especially the $\ell_2$ norm. Taking the sum of
$\ell_2$ norms as the penalization term not only helps combine MTL and GL,
as will be seen shortly, but also results in considerable computational
savings, as discussed below.

For our particular problem, we want to use the same dictionary
$D = (d_1, \cdots, d_n)$ (the concatenation of the dictionaries of
different object classes, including both foreground and background ones)
learned in the last part, and reconstruct the local patches within each
segmented region simultaneously to identify whether the region belongs to
the object or the background. Hence we drop the superscript for the
dictionary straight away and meanwhile choose the $\ell_2$ norm. The
objective we need to optimize then reduces to

$$\min_{\Omega} \sum_{k=1}^{K} \Big\| y^{(k)} - \sum_{j=1}^{n} \omega_j^{(k)} d_j \Big\|_2^2 + \lambda \sum_{j=1}^{n} \|\omega_j\|_2. \qquad (7)$$

If we further enforce group sparsity as in GL, the group reconstruction
coefficients across tasks form sub-matrices of the matrix $\Omega$. Hence
the ultimate formulation is

$$\min_{\Omega} \sum_{k=1}^{K} \Big\| y^{(k)} - \sum_{j=1}^{n} \omega_j^{(k)} d_j \Big\|_2^2 + \lambda \sum_{g=1}^{G} \|\Omega_{\mathcal{I}_g}\|_F, \qquad (8)$$

where $\|\cdot\|_F$ is the Frobenius norm for matrices, and
$\mathcal{I}_g$ plays a role similar to that in Eq. (5).

Eq. (8) can be solved by (batch-mode) blockwise coordinate descent as
in [20], with a considerably large number of iterations. Instead, we turn
the regularization into a constraint and arrive at

$$\min_{\Omega} \sum_{k=1}^{K} \Big\| y^{(k)} - \sum_{j=1}^{n} \omega_j^{(k)} d_j \Big\|_2^2, \quad \text{s.t. } \sum_{g=1}^{G} \|\Omega_{\mathcal{I}_g}\|_F \le C, \qquad (9)$$

where $C$ is a constraint parameter dual to the original regularization
parameter $\lambda$. By employing the matrix notation
$Y = (y^{(1)}, \cdots, y^{(K)})$ in addition to $D$ and $\Omega$, the
objective can be written as

$$\min_{\Omega} \operatorname{tr}\left( \Omega^T D^T D \Omega - 2 Y^T D \Omega \right), \quad \text{s.t. } \sum_{g=1}^{G} \|\Omega_{\mathcal{I}_g}\|_F \le C. \qquad (10)$$

The following definitions and proposition will be important for numerically
solving the optimization.

Definition 1 (Mixed $\ell_{p,q}$-Norm). For a vector $x \in \mathbb{R}^n$
and a set of disjoint index sets $\{\mathcal{I}_g\}_{g=1}^{G}$ such that
$\cup_g \mathcal{I}_g = \{1, \cdots, n\}$, the $\ell_{p,q}$-norm of $x$ is
defined as
$\|x\|_{p,q} = \big( \sum_{g} \|x_{\mathcal{I}_g}\|_q^p \big)^{1/p}$, where
$x_{\mathcal{I}_g}$ is the tuple consisting of the elements over the
indexes $\mathcal{I}_g$.

Definition 2 ($\ell_{p,q}$-Norm Balls).
$\mathcal{C} = \{x \mid \|x\|_{p,q} \le r\}$ is the $\ell_{p,q}$-norm ball
of radius $r$.
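
As a small numerical companion to Definitions 1 and 2, the sketch below
evaluates the mixed norm for a vector split into index groups and tests
membership in the corresponding ball. The function names and the toy data
are illustrative assumptions; the default `p=1, q=2` corresponds to the sum
of group-wise Euclidean norms used as the GMTL constraint.

```python
def mixed_norm(x, groups, p=1.0, q=2.0):
    """l_{p,q} mixed norm: the l_p norm of the vector of group-wise l_q norms."""
    group_norms = [sum(abs(x[i]) ** q for i in group) ** (1.0 / q) for group in groups]
    return sum(v ** p for v in group_norms) ** (1.0 / p)


def in_norm_ball(x, groups, radius, p=1.0, q=2.0):
    """True if x lies inside the l_{p,q}-norm ball of the given radius."""
    return mixed_norm(x, groups, p, q) <= radius


vector = [3.0, 4.0, 0.0, -1.0]
index_groups = [[0, 1], [2, 3]]          # disjoint and covering all indexes
value = mixed_norm(vector, index_groups)  # group norms are 5 and 1
```

For this vector the two group norms are 5 and 1, so the $\ell_{1,2}$ value is
their sum, and the vector lies outside any ball of radius smaller than that.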



Proposition 3 (Projection onto an $\ell_{1,2}$-Norm Ball of Radius $r$
[2H]). For a vector $x \in \mathbb{R}^n$ and a set of disjoint index sets
$\{\mathcal{I}_g\}_{g=1}^{G}$ such that
$\cup_g \mathcal{I}_g = \{1, \cdots, n\}$, the Euclidean projection
$\mathcal{P}_{\mathcal{C}}(x)$ onto the $\ell_{1,2}$-norm ball of radius
$r$ is given by

$$\hat{x}_{\mathcal{I}_g} = \operatorname{sgn}(x_{\mathcal{I}_g}) \max\left(0, \|x_{\mathcal{I}_g}\|_2 - \lambda\right), \qquad (11)$$

where $\operatorname{sgn}(z)$ is the signum function defined over the
vector $z$ as $\operatorname{sgn}(z) = z / \|z\|_2$, and $\lambda$ is
recursively defined over a subset of the index sets
$\{\mathcal{I}_g\}_{g=1}^{G}$ as
$\mathcal{I}_\lambda = \{ j \in (1, \cdots, G) \mid \|x_{\mathcal{I}_j}\|_2 > \lambda \}$
with the constraint that
$\sum_{j \in \mathcal{I}_\lambda} \left( \|x_{\mathcal{I}_j}\|_2 - \lambda \right) = r$.

The threshold $\lambda$ in the proposition can be solved for efficiently
[30], and hence so can the projection. The $\ell_{1,2}$-constrained
quadratic optimization in Eq. (10) has simple analytic gradients, and the
constraint is an $\ell_{1,2}$-norm ball with the projection rule discussed
above. Hence we can employ the projected gradient method for convex
optimization [31] as described in Algorithm 1.

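Algorithm 1: Group-Multitask Lasso Algorithm

Given $D$, $Y$, $C$, $\{\mathcal{I}_g\}_{g=1}^{G}$, $\eta$. Set $\Omega^{(0)}$, $k \leftarrow 0$
while not Converged do
    // gradient descent
    $\Omega^{(k+1)} = \Omega^{(k)} - 2\eta \left( D^T D \Omega^{(k)} - D^T Y \right)$
    // vectorize submatrices
    for $g = 1$ to $G$ do
        $\beta_g = \operatorname{vectorize}\big(\Omega^{(k+1)}_{\mathcal{I}_g}\big)$
    $\beta = (\beta_1; \cdots; \beta_G)$
    // projection
    Solve $\lambda$
    for $g = 1$ to $G$ do
        $\beta_g = \operatorname{sgn}(\beta_g) \max\left(0, \|\beta_g\|_2 - \lambda\right)$;
        $\Omega^{(k+1)}_{\mathcal{I}_g} = \operatorname{devectorize}(\beta_g)$
    $k \leftarrow k + 1$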
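
The projected gradient scheme of Algorithm 1 can be sketched in plain Python
as follows. This is a hedged illustration, not the authors' code: matrices
are lists of rows, a fixed iteration count stands in for the convergence
test, the threshold is found by the usual sort-based rule for $\ell_1$-ball
projection applied to the group norms (an assumption about how "Solve
$\lambda$" is realised), and the names `project_l12` and `gmtl` are
hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def project_l12(omega, groups, radius):
    """Project omega onto {sum_g ||omega_g||_F <= radius}; groups hold row indexes."""
    norms = [math.sqrt(sum(v * v for j in g for v in omega[j])) for g in groups]
    if sum(norms) <= radius:
        return [row[:] for row in omega]
    # Find the threshold so that the shrunken group norms sum to the radius.
    threshold, running = 0.0, 0.0
    for count, norm in enumerate(sorted(norms, reverse=True), start=1):
        running += norm
        candidate = (running - radius) / count
        if norm - candidate > 0.0:
            threshold = candidate
        else:
            break
    projected = [row[:] for row in omega]
    for g, norm in zip(groups, norms):
        scale = max(0.0, norm - threshold) / norm if norm > 0.0 else 0.0
        for j in g:
            projected[j] = [scale * v for v in omega[j]]
    return projected


def gmtl(d, y, groups, radius, step, n_iter=200):
    """Projected gradient for min ||Y - D Omega||_F^2 s.t. the l_{1,2} constraint.

    d is p x n, y is p x K (one column per task); returns Omega as n x K.
    The step should not exceed 1 / (2 * largest eigenvalue of D^T D).
    """
    gram = matmul(transpose(d), d)   # D^T D
    corr = matmul(transpose(d), y)   # D^T Y
    n, tasks = len(gram), len(y[0])
    omega = [[0.0] * tasks for _ in range(n)]
    for _ in range(n_iter):
        grad_half = matmul(gram, omega)
        omega = [[omega[j][k] - 2.0 * step * (grad_half[j][k] - corr[j][k])
                  for k in range(tasks)] for j in range(n)]
        omega = project_l12(omega, groups, radius)
    return omega


solution = gmtl([[1.0, 0.0], [0.0, 1.0]], [[3.0], [1.0]],
                groups=[[0], [1]], radius=2.0, step=0.25)
```

In the toy call, the unconstrained solution would simply copy `Y`, but the
radius forces the weaker group to zero and shrinks the stronger one, which is
the group-level selection the figure-ground decision builds on.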
Convergence is typically reached within 50 iterations with the above
projected gradient method. Upon completion, we use the results to perform
figure-ground separation. To this end, both the reconstruction error and
the reconstruction coefficients can be used. We find only a slight
difference between the two in distinguishing the object from the
background, and hence we stick to the reconstruction coefficients for
simplicity. We directly compare the sums of the reconstruction coefficients
within each semantic group (foreground/background) to arrive at the
figure-ground separation. We note that we have scaled the dictionary
elements such that every feature vector has unit $\ell_2$ norm, and hence
there should be no cross-scale problem associated with the reconstruction
coefficients.
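
The comparison of coefficient sums can be illustrated as below. The text
does not say whether signed values or magnitudes are summed; this sketch
sums absolute values, which is an assumption, and the function and argument
names are hypothetical.

```python
def label_segment(omega, foreground_atoms, background_atoms):
    """Label one segment from its coefficient matrix omega (atoms x patches).

    Sums coefficient magnitudes over the foreground and background dictionary
    atoms and returns the label of the larger total.
    """
    foreground_total = sum(abs(c) for j in foreground_atoms for c in omega[j])
    background_total = sum(abs(c) for j in background_atoms for c in omega[j])
    return "figure" if foreground_total > background_total else "ground"


# Three atoms (rows) and two patches (columns); atom 0 is a foreground atom.
segment_coefficients = [[0.9, 0.8], [0.1, 0.0], [0.0, -0.2]]
decision = label_segment(segment_coefficients, [0], [1, 2])
```

Because all patches of a segment are coded jointly, a single decision per
segment falls out of the coefficient matrix without any per-patch voting.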

Figure 4 presents one group of example results on figure-ground
separation. Notice that the separation produced by this GMTL step
(Figure 4c) is not perfect, and is often dependent on the over-segmentation
quality. Nevertheless, compared to the results produced by patch-wise
individual Lasso (Figure 4d), the GMTL results are much better. This
illustrates the benefit of using GMTL for discrimination.

3.4. Image Matting and Further Processing

Image matting, as mentioned before, is widely used for image editing and
other art production applications. Mathematically, matting involves the
simultaneous estimation of the foreground image $F_z$ ($z$ denotes the
pixel position) and the background image $B_z$ together with the alpha
matte $\alpha_z$, given the observed image $I_z$ and the matting equation

$$I_z = \alpha_z F_z + (1 - \alpha_z) B_z. \qquad (12)$$

Matting is a typical inverse problem, and needs additional constraints or
regularization to be solvable. Most existing techniques require significant
manual input, which is undesirable for our current work. Several recent
algorithms need only sparse user scribbles as inputs and even provide
closed-form solutions [15].
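
The forward direction of Eq. (12), compositing an image from a known matte,
foreground, and background, is easy to write down and helps show why the
inverse problem is under-determined: three unknowns per pixel produce one
observation. The sketch below works on single-channel images stored as lists
of rows; the function name is an assumption.

```python
def composite(alpha, foreground, background):
    """Apply I = alpha * F + (1 - alpha) * B pixel by pixel."""
    return [[a * f + (1.0 - a) * b for a, f, b in zip(alpha_row, fg_row, bg_row)]
            for alpha_row, fg_row, bg_row in zip(alpha, foreground, background)]


# One row with a pure background pixel, a half-mixed boundary pixel,
# and a pure foreground pixel.
observed = composite([[0.0, 0.5, 1.0]], [[10.0, 10.0, 10.0]], [[2.0, 2.0, 2.0]])
```

Only the boundary pixel carries a fractional alpha value, which is the kind
of pixel that matting is meant to resolve.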

We choose to use [15] to refine the figure-ground separation map obtained
from GMTL, and treat the map (with fractional values at the boundaries) as
the scribbled sparse alpha matte. The effectiveness of this novel use of
matting is confirmed by our empirical results (in the experiment part). The
solution of GMTL for each segment region can be used for SR reconstruction.
In addition, we observe that following the patch-wise sparse reconstruction
as discussed in [10] provides additional performance gains. Furthermore,
other image processing techniques can also be applied to the background
region, leading to various visual applications (as shown in Figure 1 and
the upcoming Figure 5).

(a) Input Image   (b) Over-Segmentation   (c) FG Map-MTL   (d) FG Map-Lasso

Figure 4: The figure-ground map produced by segmentation-based
GMTL vs. segmentation-based voting of patch-wise Lasso. The GMTL-based
method produces better results.



4. Experiments and Discussions 

4.1. Dataset Preparation

We select images of five object categories (cow, horse, sheep, cat, and
dog) from the VOC2009 segmentation dataset³ and the MSRC object class
recognition dataset⁴ (version 2). These datasets are suitable for our
purpose of training object/background dictionaries, because they provide
pixel-level object/background segmentation ground truth. We choose animal
images in order to work with more diversity in textures. For each selected
dataset, 15 images (about 10% of the total) across all 5 categories are
used for testing, and the remaining images for training.

Each training image and its down-sampled version (by the desired
magnification factor, typically 3) constitute a high-/low-resolution image
pair. 50,000 patches (with a typical size of 3 x 3 pixels w.r.t. the
low-resolution image) for the object and for the background, respectively,
are then sampled from the training pairs, with the aid of the available
ground-truth segmentation. For patch representation, we follow [10] and use
the first-order and second-order derivatives as features. The 1-D filters
used for feature extraction are

$$f_1 = [-1, 0, 1], \quad f_2 = f_1^T, \quad f_3 = [1, 0, -2, 0, 1], \quad f_4 = f_3^T.$$
Joint dictionary learning as described in Sec. 3.2 is then performed for each
category class and its corresponding background, over the sampled patches.
Each dictionary contains 1024 basis patches.

4.2. Figure-Ground Separation and Matting

Based on the learned dictionaries and the over-segmentation of an input
image, the GMTL algorithm works out the figure-ground separation from the
reconstruction coefficient vectors for each image segment. Several example
output maps produced by this procedure are included in Figure 5 (Group
C)⁵. For comparison, we have also generated maps based on the voting of
patch-wise reconstruction coefficients within each segment (a single Lasso
over each patch, followed by voting within a segment; Figure 5, Group B).
It is evident that GMTL often produces more reliable object regions than
the voting alternative. This success is largely due to the joint solution
to figure-ground separation within one segment, using the proposed GMTL
technique. On the other hand, even GMTL often produces object maps with
fragmented boundaries or parts, as is evident from the examples. This is
where matting comes into play. We apply our matting scheme as discussed in
Sec. 3.4 (results in Figure 5, Group D). Visual inspection suggests that
matting enhances the object boundaries substantially, notably at regions
such as the cow horns and the dogs' bodies and heads.

³ http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2009/

⁴ http://research.microsoft.com/en-us/projects/objectclassrecognition/

4.3. SR and Other Visual Applications

The SR reconstruction is hence based on the matted map instead of the
original. Figure 5 compares several of the low-resolution textured
patches/regions with those produced by our SR (Group H vs. J). The
reconstructed ones often contain significantly more detail than the
originals. Moreover, several possible visual effects obtained by further
processing the backgrounds and/or the objects are shown in Figure 5
(Group E and F).



5. Conclusions and Future Work 

In this paper we employ and integrate several state-of-the-art methods
from recent vision, learning, and graphics research, and build an SR system
with selectivity that jointly solves figure-ground separation and SR
reconstruction. It is exciting to combine the classic over-segmentation
algorithm with the sophisticated sparse coding and multitask Lasso
techniques to achieve learning-based figure-ground separation. Equally
exciting is that the matting technique from graphics research can
effectively enhance the separation, and help us generate good SR
reconstruction and appealing visual applications. We plan to further
investigate the possibility of generic semantic class identification in
the same setting.



⁵ Please refer to the legend in Figure 5 and the caption therein for details about the
groups.

Figure 5: Selective SR and various visual applications. These are four
comprehensive examples for demonstrating the system with the notations.
G: input image, A: over-segmentation map, B: patch-wise figure-ground
voting map within segments, C: GMTL-based figure-ground map, D: C
after matting, E and F: special visual effects, G and F: original patch
and SR reconstructed patch. (For better view, please refer to the electronic
version. Please zoom in to see the special effects.)